129 research outputs found

    A survey of logical models for OLAP databases


    MDM: governing evolution in big data ecosystems

    On-demand integration of multiple data sources is a critical requirement in many Big Data settings. This has been coined the data variety challenge, which refers to the complexity of dealing with a heterogeneous set of data sources to enable their integrated analysis. In Big Data settings, data sources are commonly exposed as external REST APIs, which provide data in their original format and continuously change their structure (i.e., schema). Thus, data analysts face the challenge of integrating such multiple sources and then continuously adapting their analytical processes to changes in the schema. To address these challenges, in this paper we present the Metadata Management System (MDM), a tool that supports data stewards and analysts in managing the integration and analysis of multiple heterogeneous sources under schema evolution. MDM adopts a vocabulary-based, integration-oriented ontology to conceptualize the domain of interest and relies on local-as-view mappings to link it with the sources. MDM provides user-friendly mechanisms to manage the ontology and mappings. Finally, a query rewriting algorithm ensures that queries posed to the ontology are correctly resolved over the sources in the presence of multiple schema versions, a process that is transparent to data analysts. On-site, we will showcase with real-world examples how MDM facilitates the management of multiple evolving data sources and enables their integrated analysis.

    An integration-oriented ontology to govern evolution in big data ecosystems

    Big Data architectures allow heterogeneous data from multiple sources to be flexibly stored and processed in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. Thus, data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version, as well as correctness in historical queries. A functional and performance evaluation on real-world APIs validates our approach.
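    The rewriting idea described in this abstract can be illustrated with a toy sketch: ontology-level attributes are translated, via version-specific mappings, into the attribute names each schema version actually uses. All names and data structures here are hypothetical illustrations, not the paper's actual algorithm or API.

    ```python
    # Toy sketch of ontology-mediated query rewriting under schema evolution.
    # Local-as-view style mappings: each source schema version maps shared
    # ontology concepts onto its own attribute names (all names hypothetical).
    MAPPINGS = {
        "v1": {"user_id": "id", "user_name": "name"},
        "v2": {"user_id": "uid", "user_name": "full_name"},  # renamed in v2
    }

    def rewrite(query_attrs, version):
        """Translate ontology-level attributes to one version's source attributes."""
        mapping = MAPPINGS[version]
        return [mapping[attr] for attr in query_attrs]

    def answer_over_all_versions(query_attrs, data_by_version):
        """Resolve one ontology query against every stored schema version."""
        results = []
        for version, rows in data_by_version.items():
            source_attrs = rewrite(query_attrs, version)
            for row in rows:
                results.append(tuple(row[a] for a in source_attrs))
        return results

    data = {
        "v1": [{"id": 1, "name": "Ada"}],
        "v2": [{"uid": 2, "full_name": "Grace"}],
    }
    print(answer_over_all_versions(["user_id", "user_name"], data))
    # [(1, 'Ada'), (2, 'Grace')]
    ```

    The point of the sketch is that the analyst's query mentions only ontology terms; the per-version renamings are absorbed entirely by the mappings.
    
    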

    Graph-driven federated data management (extended abstract)

    Modern data analysis applications require the ability to provide on-demand integration of data sources while offering a user-friendly query interface. Traditional methods for answering queries using views, focused on a rather static setting, fail to address such requirements. To overcome these issues, we propose a full-fledged, GLAV-based data integration approach built on graph-based constructs. The extensibility of graphs allows us to extend the traditional data integration framework with view definitions. Furthermore, we also propose a query language based on subgraphs. We tackle query answering via a query rewriting algorithm based on well-known algorithms for answering queries using views. We experimentally show that our method yields good performance with no significant overhead. Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovacion, as well as the European Union - NextGenerationEU, under project FJC2020-045809-I / AEI/10.13039/501100011033.
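    The classical "answering queries using views" setting the abstract builds on can be sketched minimally: given views defined over base relations, find a combination of views that covers the relations of a query. The relation and view names below are hypothetical, and the greedy cover is a simplification of bucket-style algorithms, not the paper's method.

    ```python
    # Minimal sketch of answering queries using views: pick views whose
    # relations jointly cover those of the query (hypothetical names).
    VIEWS = {
        "V1": {"Author", "WrotePaper"},
        "V2": {"Paper", "PublishedIn"},
        "V3": {"Author", "Affiliation"},
    }

    def candidate_views(query_relations):
        """Return views that contribute at least one relation of the query."""
        return {v for v, rels in VIEWS.items() if rels & query_relations}

    def covering_rewriting(query_relations):
        """Greedy cover: combine candidate views until the query is covered."""
        chosen, covered = [], set()
        for v in sorted(candidate_views(query_relations)):
            if not VIEWS[v] <= covered:  # skip views adding nothing new
                chosen.append(v)
                covered |= VIEWS[v]
            if query_relations <= covered:
                return chosen
        return None  # query not answerable from the available views

    print(covering_rewriting({"Author", "WrotePaper", "PublishedIn"}))
    # ['V1', 'V2']
    ```

    Real algorithms additionally check join-variable compatibility and containment of the rewriting in the query; the sketch only shows the relation-coverage skeleton.
    
    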

    Cohesion-Driven Decomposition of Service Interfaces without Access to Source Code

    Software cohesion concerns the degree to which the elements of a module belong together. Cohesive software is easier to understand, test, and maintain. In the context of service-oriented development, cohesion refers to the degree to which the operations of a service interface belong together. In the state of the art, software cohesion is improved via refactoring methods that rely on information extracted from the software implementation. This is a major limitation for using these methods with Web services: Web services do not expose their implementation; all they export is the Web service interface specification. To deal with this problem, we propose an approach that enables the cohesion-driven decomposition of service interfaces without information on how the services are implemented. Our approach progressively decomposes a given service interface into more cohesive interfaces; the backbone of the approach is a suite of cohesion metrics that rely on information extracted solely from the specification of the service interface. We validate the approach on 22 real-world services provided by Amazon and Yahoo. We assess the effectiveness of the proposed approach concerning the cohesion improvement and the number of interfaces that result from the decomposition of the examined interfaces. Moreover, we show the usefulness of the approach in a user study, where developers assessed the quality of the produced interfaces.

    Mining Service Abstractions (NIER Track)

    Several lines of research rely on the concept of service abstractions to enable the organization, composition, and adaptation of services. However, what is still missing is a systematic approach for extracting service abstractions out of the vast number of services that are available all over the Web. To deal with this issue, we propose an approach for mining service abstractions, based on an agglomerative clustering algorithm. Our experimental findings suggest that the approach is promising and can serve as a basis for future research.
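    Agglomerative clustering, the core technique this abstract relies on, can be sketched on operation names: start with singleton clusters and repeatedly merge the most similar pair until no pair exceeds a threshold. The trigram similarity, average linkage, and all operation names below are illustrative assumptions, not the paper's actual setup.

    ```python
    # Toy agglomerative clustering of service operations by name similarity.
    def similarity(a, b):
        """Jaccard similarity over character trigrams of two operation names."""
        grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
        ga, gb = grams(a.lower()), grams(b.lower())
        return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

    def cluster_sim(c1, c2):
        """Average-linkage similarity between two clusters."""
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(similarity(a, b) for a, b in pairs) / len(pairs)

    def agglomerate(names, threshold=0.3):
        """Merge the most similar clusters until no pair exceeds the threshold."""
        clusters = [[n] for n in names]
        while len(clusters) > 1:
            i, j = max(
                ((i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))),
                key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]),
            )
            if cluster_sim(clusters[i], clusters[j]) < threshold:
                break  # remaining clusters are too dissimilar to merge
            clusters[i] += clusters.pop(j)
        return clusters

    ops = ["getWeather", "getWeatherByCity", "sendEmail", "sendEmailBatch"]
    print(agglomerate(ops))
    # [['getWeather', 'getWeatherByCity'], ['sendEmail', 'sendEmailBatch']]
    ```

    Each resulting cluster would stand for one candidate service abstraction grouping closely related operations.
    
    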

    Cohesion-Driven Decomposition of Service Interfaces Without Access to Source Code

    Software cohesion concerns the degree to which the elements of a module belong together. Cohesive software is easier to understand, test, and maintain. Improving cohesion is the target of several refactoring methods proposed to date. These methods are tailored to operate by taking the source code into consideration. In the context of service-oriented development, cohesion refers to the degree to which the operations of a service interface belong together. In this context, we propose an approach for the cohesion-driven decomposition of service interfaces. The very philosophy of services dictates that all that a service exports is its specification. Hence, our approach for the cohesion-driven decomposition of service interfaces is not based on how the services are implemented. Instead, it relies only on information provided in the specification of the service interfaces. We validate the approach on 22 real-world services provided by Amazon and Yahoo. We show the effectiveness of the proposed approach concerning the cohesion improvement and the size of the produced decompositions. Moreover, we show that the proposed approach is useful by conducting a user study, where developers assessed the quality of the produced decompositions.
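    A specification-level cohesion metric of the kind this abstract describes can be sketched without any source code: measure how much an interface's operation names overlap. The token-based Jaccard metric and operation names below are illustrative assumptions, not the paper's actual metric suite.

    ```python
    # Sketch of a cohesion metric computed purely from an interface specification:
    # the average pairwise token overlap of its operation names.
    from itertools import combinations

    def name_tokens(op):
        """Split a camelCase operation name into lowercase tokens."""
        tokens, cur = [], ""
        for ch in op:
            if ch.isupper() and cur:
                tokens.append(cur.lower())
                cur = ch
            else:
                cur += ch
        tokens.append(cur.lower())
        return set(tokens)

    def cohesion(interface):
        """Average Jaccard token overlap over all operation pairs."""
        pairs = list(combinations(interface, 2))
        if not pairs:
            return 1.0  # a single-operation interface is trivially cohesive
        overlap = lambda a, b: len(a & b) / len(a | b)
        return sum(overlap(name_tokens(a), name_tokens(b)) for a, b in pairs) / len(pairs)

    mixed = ["createUser", "deleteUser", "sendInvoice"]
    focused = ["createUser", "deleteUser"]
    print(cohesion(mixed) < cohesion(focused))  # splitting raises cohesion
    # True
    ```

    A decomposition procedure would then split an interface whenever moving an operation out raises the cohesion of the resulting interfaces, mirroring the progressive decomposition described in the abstract.
    
    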